12 research outputs found

    TTWD-DA: A MATLAB toolbox for discriminant analysis based on trilinear three-way data

    Three-way trilinear data is increasingly used in chemical and biochemical applications. This type of data is composed of three-way structures representing two different signal responses and one sample dimension distributed among a 3D structure, such as the data represented by fluorescence excitation emission matrices (EEMs), spectral-pH responses, spectral-kinetic responses, spectral-electric potential responses, among others. Herein, we describe a new MATLAB toolbox for classification of trilinear three-way data using discriminant analysis techniques (linear discriminant analysis [LDA], quadratic discriminant analysis [QDA], and partial least squares discriminant analysis [PLS-DA]), termed “TTWD-DA”. These discrimination techniques were coupled to multivariate deconvolution techniques by means of parallel factor analysis (PARAFAC) and the Tucker3 algorithm. The toolbox is based on a user-friendly graphical interface, where these algorithms can be easily applied. Also, as output, multiple figures of merit are automatically calculated, such as accuracy, sensitivity and specificity. This software is freely available online
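The figures of merit named in this abstract (accuracy, sensitivity and specificity) follow from the confusion counts of a classifier. The sketch below is a minimal illustration in Python, not code from the TTWD-DA toolbox; the function name `figures_of_merit` and its arguments are hypothetical.

```python
def figures_of_merit(y_true, y_pred, positive):
    """Accuracy, sensitivity and specificity for one class treated as positive.

    Hypothetical helper for illustration; not part of TTWD-DA.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        # Fraction of all samples assigned to the correct side.
        "accuracy": (tp + tn) / len(pairs),
        # Fraction of true positives recovered.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Fraction of true negatives recovered.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]
    print(figures_of_merit(truth, predicted, positive=1))
```

For a multi-class model, the same function can be called once per class, treating that class as positive and all others as negative.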

    Variable selection towards classification of digital images: identification of altered glucose levels in serum

    classed as 125 mg/dL). Herein, we propose a method to identify control, pre-diabetic, or diabetic simulated and real-world samples based on their glucose levels using classification-based variable selection algorithms [successive projections algorithm (SPA) or genetic algorithm (GA)] coupled to linear discriminant analysis (SPA-LDA and GA-LDA) towards analyzing red–green–blue digital images. Images were recorded after glucose enzymatic reaction, whereby 250 μL of reactant content of samples were captured by using a common cell phone camera. Processing was applied to the images at a pixel level, where 72.2% of the pixels were correctly classified as control, 79.2% as pre-diabetic, and 90.9% as diabetic using the SPA-LDA algorithm; and 76.8% as control, 81.4% as pre-diabetic, and 91.7% as diabetic using the GA-LDA algorithm in the validation set containing nine simulated samples. Eight real-world samples were measured as an external test set, where the accuracy using GA-LDA was found to be 92%, with sensitivities ranging from 70% to 100% and specificities ranging from 90% to 99%. This method shows the potential of variable selection techniques coupled with digital image analysis towards blood glucose monitoring

    Colourimetric Determination of High-Density Lipoprotein (HDL) Cholesterol using Red-Green-Blue Digital Colour Imaging

    A rapid, low-cost and sensitive method for quantification of high-density lipoprotein (HDL) cholesterol based on enzymatic colorimetric reactions and digital image analysis was developed. The proposed method was adapted to a 96-microwell enzyme-linked immunosorbent assay (ELISA) plate and image acquisition was performed using a conventional desktop scanner. The images were recorded using the red-green-blue (RGB) colour system, in which the resolved absorbance for each colour channel was used for multiple linear regression. The regression model presented a root mean squared error of calibration and R² value of 1.53 mg dL⁻¹ and 0.995, respectively. Prediction was obtained with a root mean square error of prediction of 2.42 mg dL⁻¹ and R² of 0.993, therefore showing a good prediction response. A limit of detection of 0.43 mg dL⁻¹ and precision better than 1.72% reinforced these results. This method was compared with a reference methodology using UV-Vis measurements at 500 nm and no statistical difference was observed at a confidence level of 95%, showing its potential for future clinical applications

    Uncertainty estimation and misclassification probability for classification models based on discriminant analysis and support vector machines

    Uncertainty estimation provides a quantitative value of the predictive performance of a classification model based on its misclassification probability. Low misclassification probabilities are associated with a low degree of uncertainty, indicating high trustworthiness; while high misclassification probabilities are associated with a high degree of uncertainty, indicating a high susceptibility to generate incorrect classification. Herein, misclassification probability estimations based on uncertainty estimation by bootstrap were developed for classification models using discriminant analysis [linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA)] and support vector machines (SVM). Principal component analysis (PCA) was used as a variable reduction technique prior to classification. Four spectral datasets were tested (1 simulated and 3 real applications) for binary and ternary classifications. Models with lower misclassification probabilities were more stable when the spectra were perturbed with white Gaussian noise, indicating better robustness. Thus, misclassification probability can be used as an additional figure of merit to assess model robustness, providing a reliable metric to evaluate the predictive performance of a classifier
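The abstract does not give the formula behind its bootstrap-based misclassification probability, so the following Python sketch shows only one simple reading: refit a classifier on many bootstrap resamples of the training set and report, for each test sample, the fraction of refits that misclassify it. A nearest-centroid classifier stands in for LDA, QDA and SVM to keep the example dependency-free; all function names are hypothetical.

```python
import math
import random


def fit_centroids(X, y):
    """Mean vector per class (nearest-centroid stand-in for LDA/QDA/SVM)."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict(centroids, x):
    """Label of the closest class centroid (Euclidean distance)."""
    return min(centroids, key=lambda c: math.dist(centroids[c], x))


def bootstrap_misclassification(X_train, y_train, X_test, y_test,
                                n_boot=200, seed=0):
    """Per-sample fraction of bootstrap refits that misclassify each test sample.

    Assumption: this is an illustrative estimator, not the paper's procedure.
    A resample that happens to miss a class simply yields a weaker model.
    """
    rng = random.Random(seed)
    n = len(X_train)
    errors = [0] * len(X_test)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        model = fit_centroids([X_train[i] for i in idx],
                              [y_train[i] for i in idx])
        for k, (x, truth) in enumerate(zip(X_test, y_test)):
            if predict(model, x) != truth:
                errors[k] += 1
    return [e / n_boot for e in errors]
```

Values near 0 indicate a sample that almost every resampled model classifies correctly; values near 1 indicate a sample that is consistently misclassified.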

    Advances in chemometric control of commercial diesel adulteration by kerosene using IR spectroscopy

    Adulteration is a recurrent issue found in fuel screening. Commercial diesel contamination by kerosene is very difficult to detect with the physicochemical methods applied in the market. Although the contamination may affect diesel quality and storage stability, there is a lack of efficient methodologies for this evaluation. This paper assessed the use of IR spectroscopies (MIR and NIR) coupled with partial least squares (PLS) regression, support vector machine regression (SVR), and multivariate curve resolution with alternating least squares (MCR-ALS) calibration models for quantifying and identifying the presence of kerosene adulterant in commercial diesel. Moreover, principal component analysis (PCA), successive projections algorithm (SPA), and genetic algorithm (GA) tools coupled to linear discriminant analysis were used to observe the degradation behavior of 60 samples of pure and kerosene-added diesel fuel in different concentrations over 60 days of storage. Physicochemical properties of commercial diesel with 15% kerosene remained within conformity with Brazilian screening specifications; in addition, specified tests were not able to identify changes in the blends’ performance over time. By using multivariate classification, the samples of pure and contaminated fuel were accurately classified by aging level into two well-defined groups, and some spectral features related to fuel degradation products were detected. PLS and SVR accurately quantified kerosene in the 2.5–40% (v/v) range, reaching RMSEC < 2.59% and RMSEP < 5.56%, with high correlation between real and predicted concentrations. MCR-ALS with correlation constraint was able to identify and recover the spectral profile of commercial diesel and kerosene adulterant from the IR spectra of contaminated blends

    Improving data splitting for classification applications in spectrochemical analyses employing a random-mutation Kennard-Stone algorithm approach

    Motivation: Data splitting is a fundamental step for building classification models with spectral data, especially in biomedical applications. This approach is performed following pre-processing and prior to model construction, and consists of dividing the samples into at least training and test sets; herein, the training set is used for model construction and the test set for model validation. Some of the most used methodologies for data splitting are the random selection (RS) and the Kennard-Stone (KS) algorithms; here, the former works based on a random splitting process and the latter is based on the calculation of the Euclidean distance between the samples. We propose an algorithm called the Morais-Lima-Martin (MLM) algorithm, as an alternative method to improve data splitting in classification models. MLM is a modification of the KS algorithm that adds a random-mutation factor. Results: The performance of RS, KS and MLM is compared in simulated and six real-world biospectroscopic applications using principal component analysis linear discriminant analysis (PCA-LDA). MLM generated a better predictive performance in comparison with RS and KS algorithms, in particular regarding sensitivity and specificity values. Classification is found to be more well-equilibrated using MLM. RS showed the poorest predictive response, followed by KS, which showed good accuracy towards prediction but relatively unbalanced sensitivities and specificities. These findings demonstrate the potential of this new MLM algorithm as a sample selection method for classification applications in comparison with other regular methods often applied in this type of data. Availability: MLM algorithm is freely available for MATLAB at https://doi.org/10.6084/m9.figshare.7393517.v1
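The Kennard-Stone selection described in this abstract can be sketched in a few lines of Python: start from the two most distant samples, then repeatedly add the sample whose nearest already-selected neighbour is farthest away. The abstract says only that MLM adds a "random-mutation factor" to KS; the swap step below (exchanging a fraction of training and test samples at random) and the `mutation_rate` parameter are assumptions made for illustration, and the released MATLAB code should be consulted for the actual rule.

```python
import math
import random


def kennard_stone(X, n_train):
    """Return (train_indices, test_indices) chosen by Kennard-Stone."""
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(X)")
    dist = [[math.dist(a, b) for b in X] for a in X]
    # Seed the training set with the two most distant samples.
    i, j = max(((a, b) for a in range(n) for b in range(a + 1, n)),
               key=lambda pair: dist[pair[0]][pair[1]])
    selected = [i, j]
    remaining = [k for k in range(n) if k not in (i, j)]
    while len(selected) < n_train:
        # Pick the sample farthest from its closest selected neighbour.
        pick = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining


def split_with_random_mutation(X, n_train, mutation_rate=0.1, seed=0):
    """KS split followed by a random swap of some train/test samples.

    Assumption: the swap rule and mutation_rate are illustrative only.
    """
    train, test = kennard_stone(X, n_train)
    rng = random.Random(seed)
    n_swap = min(round(mutation_rate * len(train)), len(test))
    out_pos = rng.sample(range(len(train)), n_swap)
    in_pos = rng.sample(range(len(test)), n_swap)
    for a, b in zip(out_pos, in_pos):
        train[a], test[b] = test[b], train[a]
    return sorted(train), sorted(test)
```

Setting `mutation_rate=0` reduces the second function to a plain Kennard-Stone split, which makes the effect of the random step easy to compare on the same data.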

    Raman spectral discrimination in human liquid biopsies of oesophageal transformation to adenocarcinoma

    The aim of this study was to determine whether Raman spectroscopy combined with chemometric analysis can be applied to interrogate biofluids (plasma, serum, saliva and urine) towards detecting oesophageal stages through to oesophageal adenocarcinoma (normal/squamous epithelium, inflammatory, Barrett's, low-grade dysplasia [LGD], high-grade dysplasia [HGD], and oesophageal adenocarcinoma [OAC]). The chemometric analysis of the spectral data was performed using principal component analysis (PCA), successive projections algorithm (SPA) or genetic algorithm (GA) followed by quadratic discriminant analysis (QDA). The GA-QDA model using a few selected wavenumbers for saliva and urine samples achieved 100% classification for all classes. For plasma and serum, the GA-QDA model achieved excellent accuracy in all oesophageal stages (>90%). The main GA-QDA features responsible for sample discrimination were: 1012 cm⁻¹ (C-O stretching of ribose), 1336 cm⁻¹ (Amide III and CH wagging vibrations from glycine backbone), 1450 cm⁻¹ (methylene deformation), and 1660 cm⁻¹ (Amide I). The results of this study are promising and support the concept that Raman on biofluids may become a useful and objective diagnostic tool to identify oesophageal disease stages from squamous epithelium to OAC. This article is protected by copyright. All rights reserved.

    Potential of mid-infrared spectroscopy as a non-invasive diagnostic test in urine for endometrial or ovarian cancer

    The current lack of an accurate, cost-effective and non-invasive test that would allow for screening and diagnosis of gynaecological carcinomas, such as endometrial and ovarian cancer, signals the necessity for alternative approaches. The potential of spectroscopic techniques in disease investigation and diagnosis has been previously demonstrated. Here, we used attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy to analyse urine samples from women with endometrial (n=10) and ovarian cancer (n=10), as well as from healthy individuals (n=10). After applying multivariate analysis and classification algorithms, biomarkers of disease were identified and high levels of accuracy were achieved for both endometrial (95% sensitivity, 100% specificity; accuracy: 95%) and ovarian cancer (100% sensitivity, 96.3% specificity; accuracy 100%). The efficacy of this approach, in combination with the non-invasive method for urine collection, suggests a potential diagnostic tool for endometrial and ovarian cancers

    Quantification of milk adulterants (starch, H2O2, and NaClO) using colorimetric assays coupled to smartphone image analysis

    In this paper, a colorimetric method for the detection of milk adulterants using smartphone image analysis is reported. This is based on the reactions to detect hydrogen peroxide, sodium hypochlorite, and starch in milk, where a color variation is observed for each substance. The image analysis was performed by using lab-made apps (PhotoMetrix® and RedGIM®) based on partial least squares regression with the histograms of the red-green-blue images. The image histograms are automatically calculated using the smartphone camera and processed within the app. The results have shown the capability of this method to predict the concentration of the three adulterants, demonstrating the potential of the use of digital images and smartphone applications associated with chemometric tools. This method presents a fast, low-cost, and portable way to quantify adulterants in cow milk